Skip to content

Conversation

@pleshakov
Copy link
Contributor

Proposed changes

Previously, the Gateway would handle one event (upserting or deleting a
Kubernetes resource) at a time.

This PR introduces event batching: multiple events (a batch) are
handled at once.

Batching is needed because, because typically handling an event (or
multiple events at once) will result into reloading NGINX, which is
the operation we want to minimize, for the following reasons:
(1) A reload takes time - at least 200ms. The time depends on the size
of the configuration including the number of TLS certs, available CPU
cycles.
(2) A reload can have side-effects for the data plane traffic.

Now, when a new event comes, there are two cases:

  • If there is no event(s) currently being handled, the new event is
    handled immediately.
  • Otherwise, the new event will be saved for later handling. All saved
    events will be handled after the handling of the current event(s)
    finishes. Multiple saved events will be handled at once in one batch.

Additional implementation notes:

(a) The EventLoop was split into two parts:
(1) The EventHandler, which is only responsible for handling a batch of
events, without dealing with concurrency.
(2) The stripped-down EventLoop, which is responsible for batching events
and propagating batches to the EventHandler in a dedicated goroutine.

(b) The ChangeProcessor was fixed, so that when multiple changes are
captured (coming from a single batch) -- first, changes that cause
data plane reconfiguration, followed by changes that do not cause
reconfiguration -- in that case, the data plane will be reconfigured.
Without the fix, the latter changes would prevent data plane from
reconfiguring. Note that the bug only appeared when batching was added.

@pleshakov pleshakov requested a review from kate-osborn August 10, 2022 16:18
Copy link
Contributor

@kate-osborn kate-osborn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@pleshakov pleshakov requested a review from f5yacobucci August 16, 2022 16:16
Previously, the Gateway would handle one event (upserting or deleting a
Kubernetes resource) at a time.

This commit introduces event batching: multiple events (a batch) are
handled at once.

Batching is needed because, because typically handling an event (or
multiple events at once) will result into reloading NGINX, which is
the operation we want to minimize, for the following reasons:
(1) A reload takes time - at least 200ms. The time depends on the size
of the configuration including the number of TLS certs, available CPU
cycles.
(2) A reload can have side-effects for the data plane traffic.

Now, when a new event comes, there are two cases:
- If there is no event(s) currently being handled, the new event is
handled immediately.
- Otherwise, the new event will be saved for later handling. All saved
events will be handled after the handling of the current event(s)
finishes. Multiple saved events will be handled at once in one batch.

Additional implementation notes:

(a) The EventLoop was split into two parts:
(1) The EventHandler, which is only responsible for handling a batch of
events, without dealing with concurrency.
(2) The stripped-down EventLoop, which is responsible for batching events
and propagating batches to the EventHandler in a dedicated goroutine.

(b) The ChangeProcessor was fixed, so that when multiple changes are
captured (coming from a single batch) -- first, changes that cause
data plane reconfiguration, followed by changes that do not cause
reconfiguration -- in that case, the data plane will be reconfigured.
Without the fix, the latter changes would prevent data plane from
reconfiguring. Note that the bug only appeared when batching was added.
@pleshakov pleshakov force-pushed the feature/batch-events branch from 281e342 to 9501170 Compare August 16, 2022 17:01
@pleshakov pleshakov merged commit 7f831b5 into main Aug 16, 2022
@pleshakov pleshakov deleted the feature/batch-events branch August 16, 2022 17:21
@lucacome lucacome added the enhancement New feature or request label Aug 19, 2022
miledxz added a commit to miledxz/nginx-gateway-fabric that referenced this pull request Jan 14, 2025
Previously, the Gateway would handle one event (upserting or deleting a
Kubernetes resource) at a time.

This commit introduces event batching: multiple events (a batch) are
handled at once.

Batching is needed because, because typically handling an event (or
multiple events at once) will result into reloading NGINX, which is
the operation we want to minimize, for the following reasons:
(1) A reload takes time - at least 200ms. The time depends on the size
of the configuration including the number of TLS certs, available CPU
cycles.
(2) A reload can have side-effects for the data plane traffic.

Now, when a new event comes, there are two cases:
- If there is no event(s) currently being handled, the new event is
handled immediately.
- Otherwise, the new event will be saved for later handling. All saved
events will be handled after the handling of the current event(s)
finishes. Multiple saved events will be handled at once in one batch.

Additional implementation notes:

(a) The EventLoop was split into two parts:
(1) The EventHandler, which is only responsible for handling a batch of
events, without dealing with concurrency.
(2) The stripped-down EventLoop, which is responsible for batching events
and propagating batches to the EventHandler in a dedicated goroutine.

(b) The ChangeProcessor was fixed, so that when multiple changes are
captured (coming from a single batch) -- first, changes that cause
data plane reconfiguration, followed by changes that do not cause
reconfiguration -- in that case, the data plane will be reconfigured.
Without the fix, the latter changes would prevent data plane from
reconfiguring. Note that the bug only appeared when batching was added.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants